{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After BERT has run inference on the data, load the results, clean them, and convert them into the format the user expects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from static import BERT_INFER_RES, TOTAL_CSV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "sys.path.append(\"..\")\n",
    "\n",
    "from utils import load_obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from settings import raw_attr, get_large_cls_by_name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the BERT inference results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "bert_infer_data = load_obj(BERT_INFER_RES)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2499748"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(bert_infer_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Raw input texts that were fed to BERT for inference\n",
    "from BERT_infer_data import BERT_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2499748"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(BERT_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "武汉兴祥农业科技发展有限公司:\n",
      "经营范围:从事种养殖业及林果花卉;农业高科技的研究;开发;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:谷物种植;\n",
      "生物农业\n",
      "\n",
      "湖北维绿家环保科贸有限公司:\n",
      "经营范围:测绘服务;建筑劳务分包;环保咨询服务;水土流失防治服务;水利相关咨询服务;水文服务;水资源管理;节能管理服务;机械设备租赁;技术服务;技术开发;技术咨询;技术交流;技术转让;技术推广;环境保护专用设备销售;园林绿化工程施工;工程造价咨询业务;信息咨询服务;广告设计;代理;图文设计制作;凭营业执照依法自主开展经营活动;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:棉、麻、糖、烟草种植;\n",
      "先进环保\n",
      "\n",
      "湖北金元盛田农林科技有限公司:\n",
      "经营范围:中草药种植;茶叶种植;水果种植;技术服务;技术开发;技术咨询;技术交流;技术转让;技术推广;鲜肉零售;土地使用权租赁;咨询策划服务;农副产品销售;生物有机肥料研发;复合微生物肥料研发;土壤与肥料的复混加工;肥料销售;化肥销售;工程和技术研究和试验发展;农业专业及辅助性活动;智能农业管理;农业园艺服务;与农业生产经营有关的技术;农业生产托管服务;花卉种植;花卉种植;礼品花卉销售;树木种植经营;肥料生产;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:蔬菜、食用菌及园艺作物种植;\n",
      "生物农业\n",
      "\n",
      "武汉青熠田农业有限公司:\n",
      "经营范围:农业专业及辅助性活动;园艺产品销售;农业园艺服务;互联网销售;计算机软硬件及辅助设备批发;计算机软硬件及辅助设备零售;汽车装饰用品销售;办公用品销售;家居用品制造;针纺织品销售;体育用品及器材批发;体育用品及器材零售;户外用品销售;珠宝首饰批发;珠宝首饰零售;母婴用品销售;工艺美术品及礼仪用品销售;化妆品批发;化妆品零售;电子产品销售;家用电器销售;箱包销售;服装服饰批发;服装服饰零售;鞋帽零售;鞋帽批发;灯具销售;包装材料及制品销售;玩具;动漫及游艺用品销售;塑料制品销售;礼品花卉销售;农副产品销售;新鲜水果批发;新鲜水果零售;新鲜蔬菜批发;新鲜蔬菜零售;农作物种子经营;仅限不再分装的包装种子;林木种子生产经营;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:蔬菜、食用菌及园艺作物种植;\n",
      "生物农业\n",
      "\n",
      "武汉鑫盛月农业科技有限公司:\n",
      "经营范围:与农业生产经营有关的技术;农副产品销售;木材销售;国内贸易代理;金属材料销售;高品质特种钢铁材料销售;高性能有色金属及合金材料销售;初级农产品收购;食用农产品批发;食用农产品零售;畜牧渔业饲料销售;饲料添加剂销售;金属结构销售;国内货物运输代理;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:蔬菜、食用菌及园艺作物种植;\n",
      "生物农业\n",
      "\n",
      "武汉宇程农业发展有限公司:\n",
      "经营范围:初级农产品;预包装食品;日用百货;办公用品;电子产品的批发兼零售;互联网;计算机网络;农业科技的技术开发;技术咨询;计算机系统集成;网站建设;对宇程生鲜市场物业管理;普通货运代理;会议及展览服务;市场营销策划;企业管理咨询;广告设计;制作;代理及发布;场地租赁;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:蔬菜、食用菌及园艺作物种植;\n",
      "生物农业\n",
      "\n",
      "湖北省绿之邦生态科技有限公司:\n",
      "经营范围:新兴能源技术研发;花卉种植;礼品花卉销售;花卉绿植租借与代管理;城市绿化管理;城市公园管理;生物有机肥料研发;肥料销售;休闲观光活动;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:蔬菜、食用菌及园艺作物种植;\n",
      "生态休闲旅游\n",
      "\n",
      "武汉红越园艺有限公司:\n",
      "经营范围:花卉;植物盆景租赁及批发零售;园林设计及绿化养护;办公用品;日用百货;水果;水产品;园林绿化工具零售及批发;道路普通货物运输;建筑劳务分包;\n",
      "大类名称: 农、林、牧、渔业;\n",
      "中类名称: 农业;\n",
      "小类名称:蔬菜、食用菌及园艺作物种植;\n",
      "生态休闲旅游\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for idx in range(8):\n",
    "    print(BERT_data[idx])\n",
    "    print(raw_attr[bert_infer_data[idx]])\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pair each input text with its predicted label so the BERT results can be saved to CSV for users to review\n",
    "\n",
    "array = []\n",
    "for idx in range(len(bert_infer_data)):\n",
    "    text = BERT_data[idx]\n",
    "    label = raw_attr[bert_infer_data[idx]]\n",
    "    array.append([text, label])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2499748"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# infer_df = pd.DataFrame(array, columns=[\"text\", \"label\"])\n",
    "# infer_df.to_csv(BERT_INFER_CSV, index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !tail -100 $BERT_INFER_CSV"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convert to the format the user expects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re"
   ]
  },
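  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_industry_name(text):\n",
    "    \"\"\"Return the company name: the text before the first ':' that ends the first line.\"\"\"\n",
    "    t = re.match(r'(.*?):\\n', text)\n",
    "    if t is None:\n",
    "        raise ValueError(f'no company-name header found in: {text[:50]!r}')\n",
    "    return t.group(1)"
   ]
  },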
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "industry_name_label = []\n",
    "for item in array:\n",
    "    text, label = item\n",
    "    industry_name = extract_industry_name(text)\n",
    "    large_cls = get_large_cls_by_name(label)\n",
    "    industry_name_label.append([industry_name, large_cls, label])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['武汉兴祥农业科技发展有限公司', '大健康和生物技术', '生物农业'],\n",
       " ['湖北维绿家环保科贸有限公司', '绿色环保', '先进环保'],\n",
       " ['湖北金元盛田农林科技有限公司', '大健康和生物技术', '生物农业'],\n",
       " ['武汉青熠田农业有限公司', '大健康和生物技术', '生物农业'],\n",
       " ['武汉鑫盛月农业科技有限公司', '大健康和生物技术', '生物农业'],\n",
       " ['武汉宇程农业发展有限公司', '大健康和生物技术', '生物农业'],\n",
       " ['湖北省绿之邦生态科技有限公司', '文化旅游', '生态休闲旅游'],\n",
       " ['武汉红越园艺有限公司', '文化旅游', '生态休闲旅游'],\n",
       " ['武汉市江岸区春颜秋色花卉鲜花店', '文化旅游', '生态休闲旅游'],\n",
       " ['武汉金禾之稼花卉苗圃有限公司', '大健康和生物技术', '生物农业']]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "industry_name_label[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "industry_name_label_df = pd.DataFrame(\n",
    "    industry_name_label, columns=[\"industry_name\", \"965产业门类\", \"细分领域\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['industry_name', '965产业门类', '细分领域'], dtype='object')"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "industry_name_label_df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "industry_name\n",
       "王波                 76\n",
       "王胜红                68\n",
       "张榜言                63\n",
       "尹前                 61\n",
       "张会丽                53\n",
       "                   ..\n",
       "长江联运联营总公司货运部        1\n",
       "中铁物流集团有限公司武汉分公司     1\n",
       "武汉市江汉区荣诚货物运输服务部     1\n",
       "武汉市江汉区华宇货运服务部       1\n",
       "张双胜                 1\n",
       "Name: count, Length: 2460759, dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "industry_name_label_df['industry_name'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2499748, 3)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "industry_name_label_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>industry_name</th>\n",
       "      <th>965产业门类</th>\n",
       "      <th>细分领域</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>武汉兴祥农业科技发展有限公司</td>\n",
       "      <td>大健康和生物技术</td>\n",
       "      <td>生物农业</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>湖北维绿家环保科贸有限公司</td>\n",
       "      <td>绿色环保</td>\n",
       "      <td>先进环保</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    industry_name   965产业门类  细分领域\n",
       "0  武汉兴祥农业科技发展有限公司  大健康和生物技术  生物农业\n",
       "1   湖北维绿家环保科贸有限公司      绿色环保  先进环保"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "industry_name_label_df.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the full company table that the predicted labels will be merged into\n",
    "total_df = pd.read_csv(TOTAL_CSV)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2499749, 8)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['经营范围', '统一社会信用代码', '企业名称', '组织机构代码', '大类名称', '中类名称', '小类名称', '序号编码'], dtype='object')"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total_df.columns"
   ]
  },
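  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: industry_name is not unique (see value_counts above), so this left join\n",
    "# emits one row per matching label and the result has more rows than total_df.\n",
    "merge_df = pd.merge(\n",
    "    total_df,\n",
    "    industry_name_label_df,\n",
    "    left_on=\"企业名称\",\n",
    "    right_on=\"industry_name\",\n",
    "    how=\"left\",\n",
    ")"
   ]
  },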
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2692347, 11)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "merge_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>经营范围</th>\n",
       "      <th>统一社会信用代码</th>\n",
       "      <th>企业名称</th>\n",
       "      <th>组织机构代码</th>\n",
       "      <th>大类名称</th>\n",
       "      <th>中类名称</th>\n",
       "      <th>小类名称</th>\n",
       "      <th>序号编码</th>\n",
       "      <th>industry_name</th>\n",
       "      <th>965产业门类</th>\n",
       "      <th>细分领域</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>从事种养殖业及林果花卉;农业高科技的研究;开发</td>\n",
       "      <td>91420100616433920K</td>\n",
       "      <td>武汉兴祥农业科技发展有限公司</td>\n",
       "      <td>61643392-0</td>\n",
       "      <td>农、林、牧、渔业</td>\n",
       "      <td>农业</td>\n",
       "      <td>谷物种植</td>\n",
       "      <td>1</td>\n",
       "      <td>武汉兴祥农业科技发展有限公司</td>\n",
       "      <td>大健康和生物技术</td>\n",
       "      <td>生物农业</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>测绘服务;建筑劳务分包;环保咨询服务;水土流失防治服务;水利相关咨询服务;水文服务;水资源管...</td>\n",
       "      <td>9142010206680049XQ</td>\n",
       "      <td>湖北维绿家环保科贸有限公司</td>\n",
       "      <td>06680049-X</td>\n",
       "      <td>农、林、牧、渔业</td>\n",
       "      <td>农业</td>\n",
       "      <td>棉、麻、糖、烟草种植</td>\n",
       "      <td>6</td>\n",
       "      <td>湖北维绿家环保科贸有限公司</td>\n",
       "      <td>绿色环保</td>\n",
       "      <td>先进环保</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>中草药种植;茶叶种植;水果种植;技术服务;技术开发;技术咨询;技术交流;技术转让;技术推广;...</td>\n",
       "      <td>91421281MA497T3P3Y</td>\n",
       "      <td>湖北金元盛田农林科技有限公司</td>\n",
       "      <td>MA497T3P-3</td>\n",
       "      <td>农、林、牧、渔业</td>\n",
       "      <td>农业</td>\n",
       "      <td>蔬菜、食用菌及园艺作物种植</td>\n",
       "      <td>11</td>\n",
       "      <td>湖北金元盛田农林科技有限公司</td>\n",
       "      <td>大健康和生物技术</td>\n",
       "      <td>生物农业</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                经营范围            统一社会信用代码  \\\n",
       "0                            从事种养殖业及林果花卉;农业高科技的研究;开发  91420100616433920K   \n",
       "1  测绘服务;建筑劳务分包;环保咨询服务;水土流失防治服务;水利相关咨询服务;水文服务;水资源管...  9142010206680049XQ   \n",
       "2  中草药种植;茶叶种植;水果种植;技术服务;技术开发;技术咨询;技术交流;技术转让;技术推广;...  91421281MA497T3P3Y   \n",
       "\n",
       "             企业名称      组织机构代码       大类名称 中类名称           小类名称  序号编码  \\\n",
       "0  武汉兴祥农业科技发展有限公司  61643392-0   农、林、牧、渔业   农业           谷物种植     1   \n",
       "1   湖北维绿家环保科贸有限公司  06680049-X   农、林、牧、渔业   农业     棉、麻、糖、烟草种植     6   \n",
       "2  湖北金元盛田农林科技有限公司  MA497T3P-3   农、林、牧、渔业   农业  蔬菜、食用菌及园艺作物种植    11   \n",
       "\n",
       "    industry_name   965产业门类  细分领域  \n",
       "0  武汉兴祥农业科技发展有限公司  大健康和生物技术  生物农业  \n",
       "1   湖北维绿家环保科贸有限公司      绿色环保  先进环保  \n",
       "2  湖北金元盛田农林科技有限公司  大健康和生物技术  生物农业  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "merge_df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop the helper join key; 企业名称 already holds the company name\n",
    "tmp_df = merge_df.drop(columns=\"industry_name\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>经营范围</th>\n",
       "      <th>统一社会信用代码</th>\n",
       "      <th>企业名称</th>\n",
       "      <th>组织机构代码</th>\n",
       "      <th>大类名称</th>\n",
       "      <th>中类名称</th>\n",
       "      <th>小类名称</th>\n",
       "      <th>序号编码</th>\n",
       "      <th>965产业门类</th>\n",
       "      <th>细分领域</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>从事种养殖业及林果花卉;农业高科技的研究;开发</td>\n",
       "      <td>91420100616433920K</td>\n",
       "      <td>武汉兴祥农业科技发展有限公司</td>\n",
       "      <td>61643392-0</td>\n",
       "      <td>农、林、牧、渔业</td>\n",
       "      <td>农业</td>\n",
       "      <td>谷物种植</td>\n",
       "      <td>1</td>\n",
       "      <td>大健康和生物技术</td>\n",
       "      <td>生物农业</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>测绘服务;建筑劳务分包;环保咨询服务;水土流失防治服务;水利相关咨询服务;水文服务;水资源管...</td>\n",
       "      <td>9142010206680049XQ</td>\n",
       "      <td>湖北维绿家环保科贸有限公司</td>\n",
       "      <td>06680049-X</td>\n",
       "      <td>农、林、牧、渔业</td>\n",
       "      <td>农业</td>\n",
       "      <td>棉、麻、糖、烟草种植</td>\n",
       "      <td>6</td>\n",
       "      <td>绿色环保</td>\n",
       "      <td>先进环保</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>中草药种植;茶叶种植;水果种植;技术服务;技术开发;技术咨询;技术交流;技术转让;技术推广;...</td>\n",
       "      <td>91421281MA497T3P3Y</td>\n",
       "      <td>湖北金元盛田农林科技有限公司</td>\n",
       "      <td>MA497T3P-3</td>\n",
       "      <td>农、林、牧、渔业</td>\n",
       "      <td>农业</td>\n",
       "      <td>蔬菜、食用菌及园艺作物种植</td>\n",
       "      <td>11</td>\n",
       "      <td>大健康和生物技术</td>\n",
       "      <td>生物农业</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                经营范围            统一社会信用代码  \\\n",
       "0                            从事种养殖业及林果花卉;农业高科技的研究;开发  91420100616433920K   \n",
       "1  测绘服务;建筑劳务分包;环保咨询服务;水土流失防治服务;水利相关咨询服务;水文服务;水资源管...  9142010206680049XQ   \n",
       "2  中草药种植;茶叶种植;水果种植;技术服务;技术开发;技术咨询;技术交流;技术转让;技术推广;...  91421281MA497T3P3Y   \n",
       "\n",
       "             企业名称      组织机构代码       大类名称 中类名称           小类名称  序号编码   965产业门类  \\\n",
       "0  武汉兴祥农业科技发展有限公司  61643392-0   农、林、牧、渔业   农业           谷物种植     1  大健康和生物技术   \n",
       "1   湖北维绿家环保科贸有限公司  06680049-X   农、林、牧、渔业   农业     棉、麻、糖、烟草种植     6      绿色环保   \n",
       "2  湖北金元盛田农林科技有限公司  MA497T3P-3   农、林、牧、渔业   农业  蔬菜、食用菌及园艺作物种植    11  大健康和生物技术   \n",
       "\n",
       "   细分领域  \n",
       "0  生物农业  \n",
       "1  先进环保  \n",
       "2  生物农业  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tmp_df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# utf_8_sig writes a UTF-8 byte order mark so that Excel detects the encoding\n",
    "tmp_df.to_csv(\"武汉200万企业分类_utf8.csv\", index=False, encoding=\"utf_8_sig\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
